Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: state sync #1026

Open
wants to merge 71 commits into
base: v1.5-dev
Choose a base branch
from
Open

feat!: state sync #1026

wants to merge 71 commits into from

Conversation

lklimek
Copy link
Collaborator

@lklimek lklimek commented Jan 15, 2025

Issue being fixed or feature implemented

What was done?

  • added new call, FinalizeSnapshot, that is sent to ABCI App once state sync is finished
  • fixed bug in QuorumVerify call
  • updated e2e tests to test state sync
  • removed unused config settings
  • on statesync timeout, fall back to block sync
  • added config settings to [statesync]: retries
  • added safe integer type casting code

How Has This Been Tested?

Updated e2e tests, updated unit test.

Breaking Changes

  • removed unused configuration settings from [statesync] section: trust-height, trust-hash, trust-period
  • added new ABCI method FinalizeSnapshot
  • bumped ABCI protocol version to 1.3.0

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

Summary by CodeRabbit

  • New Features

    • Introduced snapshot finalization support, enabling improved state synchronization via new snapshot finalization endpoints.
  • Configuration

    • Replaced legacy trust-based state sync parameters with a new "Retries" setting.
    • Updated network configuration to use peer-to-peer synchronization.
  • Dependency & Version Updates

    • Upgraded several dependencies and advanced the ABCI library version from 1.2.0 to 1.3.0.
  • Documentation

    • Enhanced API and protocol documentation to cover the new snapshot finalization functionality.

Copy link
Contributor

coderabbitai bot commented Jan 15, 2025

Walkthrough

This pull request introduces a new FinalizeSnapshot functionality across multiple layers of the application. The changes add the FinalizeSnapshot method to various client implementations, mocks, and the ABCI application. Alongside this feature, state synchronization has been updated by replacing trust parameters with a retry mechanism. Additional improvements include dependency updates, enhanced safe integer conversions, logging adjustments, and numerous test and documentation modifications across state sync, consensus, and P2P components.

Changes

File(s) Change Summary
abci/client/{grpc_client.go, mocks/client.go, routed_client.go, socket_client.go}, internal/proxy/client.go Added new FinalizeSnapshot method in client implementations and mocks; integrated conversion via ToRequestFinalizeSnapshot and updated request handling.
abci/example/kvstore/kvstore.go, abci/types/{application.go, messages.go, mocks/application.go}, proto/tendermint/abci/types.proto, spec/abci++/api.md Introduced snapshot finalization on the application side with verification checks, added new request/response messages, and updated interfaces for snapshot finalization.
internal/statesync/{chunks.go, reactor.go, stateprovider.go, syncer.go, syncer_test.go, reactor_test.go, mocks/stateprovider.go} Enhanced state sync logic: added LightBlock and finalizeSnapshot methods, updated chunk management, refined retry and peer handling, and adjusted tests to support the new flow.
config/{config.go, toml.go}, test/e2e/{networks/rotate.toml, runner/setup.go, runner/start.go} Modified state sync configuration by removing trust parameters and adding a Retries field; updated state_sync settings to "p2p" in test configs and removed obsolete config update functions.
go.mod, version/version.go Updated several dependencies and bumped ABCISemVer from "1.2.0" to "1.3.0".
libs/math/{safemath.go, safemath_test.go} Added robust safe conversion functions (with overflow checks), additional conversion helpers, and corresponding tests.
internal/{blocksync/reactor.go, evidence/reactor.go, mempool/reactor.go, statesync/peer.go, p2p/transport_mconn.go} Adjusted logging verbosity (from Debug/Info to Trace) in various reactor modules and changed the default connection port in MConn transport from 26657 to 26656.
types/quorum_sign_data.go, internal/consensus/state_data.go, types/validator_test.go, test/e2e/pkg/mockcoreserver/{core_server.go, methods.go} Minor corrections such as updating quorum hash length validation, fixing comment typos, and enhancing validator hash tests and mock core server behavior.
.mockery.yaml, internal/{consensus/state.go, libs/sync/mutexguard.go, statesync/mocks/consensusstateprovider.go} Introduced the ConsensusStateProvider interface and its corresponding mock, added a GetCurrentHeight method in consensus state, and provided utility functions for safe mutex handling.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client (gRPC/Socket)
    participant P as Proxy Client
    participant R as Routed Client
    participant S as Socket Client
    participant A as ABCI Application
    C->>P: FinalizeSnapshot(ctx, req)
    P->>R: Delegate FinalizeSnapshot request
    R->>S: Forward FinalizeSnapshot call
    S->>A: Call FinalizeSnapshot (with ToRequestFinalizeSnapshot)
    A-->>S: Return ResponseFinalizeSnapshot (+ error)
    S-->>R: Pass back response
    R-->>P: Relay response
    P-->>C: Return ResponseFinalizeSnapshot
Loading

Poem

I'm a happy little bunny with code so bright,
Hoping through snapshots from morning till night.
FinalizeSnapshot now joins the game,
With safe math and sync, our code earns its fame.
Logs and tests dance like leaves in the breeze—
A joyful hop for a code that surely frees!
🐰✨

Tip

🌐 Web search-backed reviews and chat
  • We have enabled web search-based reviews and chat for all users. This feature allows CodeRabbit to access the latest documentation and information on the web.
  • You can disable this feature by setting web_search: false in the knowledge_base settings.
  • Please share any feedback in the Discord discussion.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@lklimek
Copy link
Collaborator Author

lklimek commented Jan 20, 2025

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Jan 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
abci/types/application.go (1)

102-104: Consider adding documentation for the base implementation.

While the empty implementation is appropriate for a base class, adding a comment explaining that this is a default no-op implementation would improve clarity.

+// FinalizeSnapshot provides a no-op implementation for the StateSyncer interface
 func (BaseApplication) FinalizeSnapshot(_ context.Context, _req *RequestFinalizeSnapshot) (*ResponseFinalizeSnapshot, error) {
   return &ResponseFinalizeSnapshot{}, nil
 }
proto/tendermint/abci/types.proto (1)

642-644: Consider adding status field to ResponseFinalizeSnapshot.

The empty response provides no feedback mechanism for the application to indicate success/failure of the finalization process.

Consider adding a status field:

message ResponseFinalizeSnapshot {
+  bool success = 1;  // Indicates if snapshot finalization was successful
+  string error = 2;  // Error message if finalization failed
}
abci/example/kvstore/kvstore.go (1)

583-598: Implementation looks good, but error messages could be more descriptive.

The FinalizeSnapshot implementation correctly verifies both height and app hash, with proper thread safety using mutex. However, the error messages could be more informative.

Consider enhancing the error messages with more details:

-    return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot height mismatch")
+    return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot height mismatch: expected %d, got %d", 
+        app.LastCommittedState.GetHeight(), req.LightBlock.SignedHeader.Header.Height)

-    return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot apphash mismatch")
+    return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot apphash mismatch: expected %x, got %x",
+        app.LastCommittedState.GetAppHash(), req.LightBlock.SignedHeader.Header.AppHash)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6890a7 and e84e0be.

⛔ Files ignored due to path filters (2)
  • abci/types/types.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (30)
  • abci/client/grpc_client.go (1 hunks)
  • abci/client/mocks/client.go (1 hunks)
  • abci/client/routed_client.go (1 hunks)
  • abci/client/socket_client.go (1 hunks)
  • abci/example/kvstore/kvstore.go (1 hunks)
  • abci/types/application.go (2 hunks)
  • abci/types/messages.go (1 hunks)
  • abci/types/mocks/application.go (1 hunks)
  • config/config.go (0 hunks)
  • config/toml.go (0 hunks)
  • go.mod (3 hunks)
  • internal/proxy/client.go (1 hunks)
  • internal/statesync/chunks.go (2 hunks)
  • internal/statesync/mocks/stateprovider.go (1 hunks)
  • internal/statesync/reactor.go (2 hunks)
  • internal/statesync/reactor_test.go (1 hunks)
  • internal/statesync/stateprovider.go (5 hunks)
  • internal/statesync/syncer.go (4 hunks)
  • internal/statesync/syncer_test.go (4 hunks)
  • light/client.go (1 hunks)
  • light/client_test.go (1 hunks)
  • proto/tendermint/abci/types.proto (5 hunks)
  • spec/abci++/api.md (7 hunks)
  • test/e2e/networks/rotate.toml (2 hunks)
  • test/e2e/pkg/mockcoreserver/core_server.go (1 hunks)
  • test/e2e/pkg/mockcoreserver/methods.go (1 hunks)
  • test/e2e/runner/setup.go (0 hunks)
  • test/e2e/runner/start.go (1 hunks)
  • types/quorum_sign_data.go (1 hunks)
  • version/version.go (1 hunks)
💤 Files with no reviewable changes (3)
  • test/e2e/runner/setup.go
  • config/toml.go
  • config/config.go
✅ Files skipped from review due to trivial changes (2)
  • light/client_test.go
  • light/client.go
🧰 Additional context used
🪛 Markdownlint (0.37.0)
spec/abci++/api.md

431-431: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


439-439: null
Link fragments should be valid

(MD051, link-fragments)


945-945: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: tests (02)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (31)
version/version.go (1)

14-14: ABCI version bump aligns with new functionality.

The version update from 1.2.0 to 1.3.0 correctly follows semantic versioning for introducing the new FinalizeSnapshot method.

Let's verify this version is consistently used across the codebase:

✅ Verification successful

ABCI version update is consistently implemented.

The version bump from 1.2.0 to 1.3.0 is properly defined and consistently used across the codebase through the centralized version.ABCIVersion constant. No conflicting version strings were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any hardcoded ABCI version strings
rg -g '!version/version.go' '"1\.[0-9]\.[0-9]"' 

Length of output: 48492


Script:

#!/bin/bash
# Search for ABCI version references more specifically
rg -i "abci.*version|version.*abci"

# Also check for any proto definitions that might reference ABCI versions
fd -e proto | xargs rg -i "version|abci"

Length of output: 37694

test/e2e/runner/start.go (1)

91-91: Simplified error handling looks good.

The removal of intermediate block variables and direct error checking from waitForHeight aligns with the removal of trust parameters from state sync configuration.

test/e2e/pkg/mockcoreserver/methods.go (1)

68-68: Correctly includes QuorumHash in unmarshal parameters.

The addition of &cmd.QuorumHash ensures proper unmarshalling of the quorum verification request.

abci/types/messages.go (1)

90-94: New ToRequestFinalizeSnapshot function follows established patterns.

The implementation correctly converts RequestFinalizeSnapshot to the generic Request type, maintaining consistency with other message conversion functions in the file.

abci/types/application.go (1)

19-20: LGTM! Well-documented interface addition.

The new FinalizeSnapshot method in the StateSyncer interface is clearly documented and aligns with the state sync enhancement objectives.

test/e2e/pkg/mockcoreserver/core_server.go (1)

149-150: LGTM! Good simplification of return value.

The change simplifies the code by directly returning the boolean value instead of wrapping it in a struct, while maintaining the same functionality.

internal/statesync/mocks/stateprovider.go (1)

148-205: LGTM! Correctly generated mock implementation.

The new LightBlock mock method follows the established patterns for error handling and return values, properly supporting the state sync enhancement.

types/quorum_sign_data.go (1)

164-164: Verify the impact of changing the hash size constant.

The validation now uses crypto.QuorumHashSize instead of crypto.DefaultHashSize. Please ensure:

  1. The new constant exists and has the correct value
  2. All existing quorum hashes comply with the new size requirement
✅ Verification successful

QuorumHashSize constant is correctly defined and used consistently

The validation now uses the semantically correct crypto.QuorumHashSize constant which equals crypto.DefaultHashSize (32 bytes). The codebase consistently validates quorum hashes against this size through multiple validation checks, and test files demonstrate correct usage of the constant.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and usage of QuorumHashSize

# Check if QuorumHashSize is defined
rg "QuorumHashSize.*=.*\d+" -A 2

# Find all quorum hash declarations to verify size compatibility
ast-grep --pattern 'QuorumHash []byte'

Length of output: 75


Script:

#!/bin/bash
# Broader search for hash size constants and usages

# Find QuorumHashSize constant definition and usage
rg "QuorumHashSize" -B 2 -A 2

# Find DefaultHashSize for comparison
rg "DefaultHashSize" -B 2 -A 2

# Find quorum hash related code with more context
rg "QuorumHash" -B 2 -A 2

# Look specifically in crypto package
fd -t f . "crypto" --exec cat {}

Length of output: 80387

internal/proxy/client.go (1)

215-218: LGTM! Implementation follows established patterns.

The FinalizeSnapshot method is well-implemented, following the same patterns as other methods in the file:

  • Consistent error handling
  • Proper timing metrics
  • Correct delegation to the underlying client
abci/client/grpc_client.go (1)

258-260: LGTM! Implementation follows established patterns.

The FinalizeSnapshot method is well-implemented, following the same patterns as other methods in the file:

  • Proper request conversion using ToRequestFinalizeSnapshot
  • Consistent use of grpc.WaitForReady(true) option
  • Correct error handling
internal/statesync/chunks.go (2)

136-144: LGTM! Improved handling of empty chunks.

The changes improve the robustness of chunk handling:

  • Maintains essential nil check for the chunk itself
  • Safely handles empty chunk content by replacing nil with empty byte slice
  • Prevents potential nil pointer dereference while allowing valid empty chunks

164-164: LGTM! Consistent use of modified data.

Correctly uses the potentially modified data variable for writing, ensuring consistency with the empty chunk handling changes.

abci/client/routed_client.go (1)

354-357: LGTM! Implementation follows established patterns.

The FinalizeSnapshot method is well-implemented, following the same patterns as other methods in the file:

  • Consistent use of delegate pattern
  • Proper type assertion for response
  • Correct error handling
internal/statesync/stateprovider.go (3)

43-44: LGTM: Well-structured interface addition

The new LightBlock method follows the existing interface pattern and is properly documented.


126-131: LGTM: Thread-safe implementation of LightBlock

The implementation correctly reuses the existing verifyLightBlockAtHeight method and maintains thread safety through proper mutex usage.


271-276: LGTM: Consistent implementation across providers

The P2P implementation maintains consistency with the RPC implementation, ensuring proper thread safety and error handling.

internal/statesync/syncer.go (2)

353-354: LGTM: Enhanced sync process with light blocks

The change improves the sync process by retrieving complete light blocks instead of just commits, providing more comprehensive block data for verification.


581-599: LGTM: Robust snapshot finalization implementation

The implementation properly handles snapshot finalization by sending light blocks to the ABCI app and includes appropriate logging.

abci/types/mocks/application.go (1)

Line range hint 261-422: LGTM! The new FinalizeSnapshot mock implementation follows established patterns.

The implementation correctly handles:

  • Error cases with appropriate panic messages
  • Type assertions and conversions
  • Method signatures matching the interface
  • Supporting mock types for testing
internal/statesync/reactor_test.go (1)

239-240: LGTM! Mock expectation for FinalizeSnapshot is correctly configured.

The mock is set up to be called exactly once and return an empty response, which is appropriate for this test scenario.

internal/statesync/reactor.go (1)

239-240: LGTM! Trust-related configuration has been cleanly removed.

The changes consistently remove trust-related parameters from both P2P and RPC state provider initialization while maintaining proper logging.

Also applies to: 252-253, 261-261

abci/client/mocks/client.go (1)

365-422: LGTM! The new FinalizeSnapshot mock implementation follows established patterns.

The implementation correctly handles:

  • Error cases with appropriate panic messages
  • Type assertions and conversions
  • Method signatures matching the interface
  • Supporting mock types for testing
test/e2e/networks/rotate.toml (1)

155-155: LGTM! State sync configuration changes look good.

The changes consistently configure p2p-based state sync for both validator06 and full01 nodes, aligning with the removal of trust-based parameters.

Also applies to: 195-195

abci/client/socket_client.go (1)

360-366: LGTM! Clean implementation of FinalizeSnapshot method.

The implementation follows the established pattern for ABCI methods in the socket client, with proper error handling and request/response flow.

go.mod (1)

12-13: Verify compatibility of updated dependencies.

Multiple dependency updates have been made. Please ensure:

  1. The new version of dashd-go (v0.26.1-0.20250113154619-015ef93ad36e) is stable and compatible
  2. The btcec/v2 v2.2.0 update doesn't introduce breaking changes
  3. The upgrade to stable versions (btclog v1.0.0) is intentional

Also applies to: 83-83, 111-111

✅ Verification successful

Dependencies update appears safe to proceed

The verification shows these are indirect dependencies with no direct usage in the codebase. The version updates follow semantic versioning principles, suggesting backward compatibility. While breaking changes exist in dashd-go, they don't appear to affect this codebase as there's no direct usage of the updated components.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any known issues or breaking changes in the updated dependencies

# Check dashd-go changelog/releases
gh api repos/dashpay/dashd-go/releases | jq -r '.[].body' | grep -i "breaking change\|security"

# Check btcec changelog
gh api repos/dashpay/dashd-go/contents/btcec/v2/CHANGELOG.md | jq -r '.content' | base64 -d | grep -A 5 "## v2.2.0"

Length of output: 356


Script:

#!/bin/bash
# Search for import patterns and potential breaking changes
echo "=== Checking import patterns ==="
rg -l "github.com/dashpay/dashd-go/btcec/v2" --type go

echo -e "\n=== Checking documentation ==="
fd -e md -e txt . | xargs rg -l "dashd-go|btcec" || true

echo -e "\n=== Checking recent changes ==="
git log -n 5 --oneline -- go.mod

echo -e "\n=== Checking current usages ==="
rg "btcec\..*" --type go -A 2

Length of output: 454


Script:

#!/bin/bash
# Check for indirect dependencies and upgrade guides
echo "=== Checking go.mod for indirect dependencies ==="
cat go.mod | grep -A 10 "require"

echo -e "\n=== Looking for upgrade documentation ==="
fd -e md . | xargs rg -l "upgrade|migration|breaking|changelog" || true

echo -e "\n=== Checking dashd-go main repository ==="
gh api repos/dashpay/dashd-go/contents/docs | jq -r '.[].name' || true
gh api repos/dashpay/dashd-go/contents/UPGRADING.md | jq -r '.content' | base64 -d || true

Length of output: 4572

proto/tendermint/abci/types.proto (2)

187-193: LGTM! Well-defined RequestFinalizeSnapshot message.

The message includes the essential light_block field needed for state reconstruction after snapshot sync.


908-908: LGTM! Service definition properly updated.

The FinalizeSnapshot RPC method is correctly added to the ABCIApplication service.

internal/statesync/syncer_test.go (4)

103-106: LGTM: Commit structure properly includes Height field.

The commit structure now correctly includes the Height field, which is essential for proper block validation.


108-115: LGTM: Well-structured LightBlock implementation.

The LightBlock structure properly encapsulates both the Commit and Header, following the standard blockchain block structure pattern.


131-132: LGTM: StateProvider mock updated to use LightBlock.

The mock has been correctly updated to return a LightBlock instead of just the commit, aligning with the new state sync protocol.


268-270: LGTM: FinalizeSnapshot mock properly configured.

The FinalizeSnapshot mock is correctly set up with proper expectations and return values.

spec/abci++/api.md Outdated Show resolved Hide resolved
@lklimek lklimek added this to the v1.5 milestone Jan 20, 2025
@lklimek lklimek marked this pull request as ready for review January 20, 2025 08:55
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
proto/tendermint/abci/types.proto (2)

186-196: Fix documentation formatting.

The documentation has some minor formatting issues:

  1. Missing period after "e.g"
  2. Missing blank line before heading
 // RequestFinalizeSnapshot is called by Tenderdash after successfully applying all snapshot chunks, eg.
+// RequestFinalizeSnapshot is called by Tenderdash after successfully applying all snapshot chunks, e.g.,
 // when the ABCI application returned `ResponseApplySnapshotChunk` with `Result = ACCEPT`.
 // It includes the light block committed at the synced height, which Tenderdash uses to reconstruct its own state.
 // The application should validate the light block against its restored state.
+
 // If the application fails to validate the light block, it should return error in the response.
 // This is considered fatal, non-recoverable consensus failure and will cause Tenderdash to restart.

Line range hint 947-956: Fix documentation formatting.

Add a blank line before the heading.

 <a name="tendermint-abci-ResponseFinalizeSnapshot"></a>
+
 ### ResponseFinalizeSnapshot
 The response to a `RequestPrepareSnapshot` message.
abci/example/kvstore/kvstore.go (2)

583-600: Consider enhancing the logging for better observability.

The debug logging could be more detailed to help with troubleshooting. Consider adding logging for the validation steps and error cases.

 func (app *Application) FinalizeSnapshot(_ctx context.Context, req *abci.RequestFinalizeSnapshot) (*abci.ResponseFinalizeSnapshot, error) {
     app.mu.Lock()
     defer app.mu.Unlock()
 
+    app.logger.Debug("Starting snapshot finalization",
+        "expected_height", app.LastCommittedState.GetHeight(),
+        "expected_app_hash", app.LastCommittedState.GetAppHash(),
+        "snapshot_height", req.LightBlock.SignedHeader.Header.Height,
+        "snapshot_app_hash", req.LightBlock.SignedHeader.Header.AppHash)
+
     // we verify snapshot height and app hash, as there is no additional logic to be called here
     if app.LastCommittedState.GetHeight() != req.LightBlock.SignedHeader.Header.Height {
+        app.logger.Error("Snapshot height mismatch",
+            "expected", app.LastCommittedState.GetHeight(),
+            "got", req.LightBlock.SignedHeader.Header.Height)
         return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot height mismatch: expected %d, got %d",
             app.LastCommittedState.GetHeight(), req.LightBlock.SignedHeader.Header.Height)
     }
 
     if !app.LastCommittedState.GetAppHash().Equal(req.LightBlock.SignedHeader.Header.AppHash) {
+        app.logger.Error("Snapshot app hash mismatch",
+            "expected", app.LastCommittedState.GetAppHash(),
+            "got", req.LightBlock.SignedHeader.Header.AppHash)
         return &abci.ResponseFinalizeSnapshot{}, fmt.Errorf("snapshot apphash mismatch: expected %x, got %x",
             app.LastCommittedState.GetAppHash(), req.LightBlock.SignedHeader.Header.AppHash)
     }
 
     app.logger.Debug("FinalizeSnapshot finished successfully", "req", req)
     return &abci.ResponseFinalizeSnapshot{}, nil
 }

583-600: Well-integrated state sync implementation.

The FinalizeSnapshot method is well-integrated into the state sync workflow, following the same patterns as other ABCI methods (mutex locking, error handling, logging). It complements the existing snapshot-related methods (OfferSnapshot, ApplySnapshotChunk, etc.) to provide a complete state synchronization solution.

The implementation demonstrates good practices:

  1. Consistent error handling across all snapshot-related methods
  2. Thread-safety through mutex usage
  3. Clear separation of concerns between different snapshot operations
  4. Proper validation of snapshot metadata before finalization
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e84e0be and af79697.

⛔ Files ignored due to path filters (1)
  • abci/types/types.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (4)
  • abci/example/kvstore/kvstore.go (1 hunks)
  • abci/types/application.go (2 hunks)
  • proto/tendermint/abci/types.proto (5 hunks)
  • spec/abci++/api.md (7 hunks)
🧰 Additional context used
🪛 LanguageTool
spec/abci++/api.md

[duplication] ~431-~431: Possible typo: you repeated a word.
Context: ...abci-RequestFinalizeSnapshot"> ### RequestFinalizeSnapshot RequestFinalizeSnapshot is called by Tenderdash after successfu...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~432-~432: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...cessfully applying all snapshot chunks, eg. when the ABCI application returned `Res...

(E_G)

🪛 Markdownlint (0.37.0)
spec/abci++/api.md

431-431: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


443-443: null
Link fragments should be valid

(MD051, link-fragments)


949-949: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)

🔇 Additional comments (5)
abci/types/application.go (2)

19-20: LGTM! Interface addition is well-documented.

The FinalizeSnapshot method is properly added to the StateSyncer interface with clear documentation.


102-105: LGTM! Base implementation provides good default behavior.

The no-op implementation in BaseApplication follows the pattern used for other methods.

proto/tendermint/abci/types.proto (1)

Line range hint 252-252: LGTM! Protocol changes are well-structured.

The new finalize_snapshot field is properly added to Request/Response messages and the ABCIApplication service.

Also applies to: 836-836, 1402-1402

spec/abci++/api.md (1)

431-444: Documentation needs enhancement.

The FinalizeSnapshot documentation should be expanded to include:

  1. When exactly Tenderdash calls this method in the state sync process
  2. What the application should do with the light block
  3. Error handling expectations
  4. Impact on the node's state

Consider adding a detailed section similar to other ABCI methods:

### RequestFinalizeSnapshot

#### Usage
- Called by Tenderdash after successfully applying all snapshot chunks
- The light block allows the application to verify the final state
- Applications should validate the light block against their restored state
- Any errors during finalization should trigger a state sync retry

#### When does Tenderdash call it?
Detailed sequence of events...
🧰 Tools
🪛 LanguageTool

[duplication] ~431-~431: Possible typo: you repeated a word.
Context: ...abci-RequestFinalizeSnapshot"> ### RequestFinalizeSnapshot RequestFinalizeSnapshot is called by Tenderdash after successfu...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~432-~432: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...cessfully applying all snapshot chunks, eg. when the ABCI application returned `Res...

(E_G)

🪛 Markdownlint (0.37.0)

431-431: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


443-443: null
Link fragments should be valid

(MD051, link-fragments)

abci/example/kvstore/kvstore.go (1)

583-600: LGTM! The implementation looks solid.

The FinalizeSnapshot method correctly validates both the height and AppHash of the snapshot against the last committed state, with proper error handling and thread safety.

A few notable strengths of this implementation:

  1. Thread-safe with mutex locking
  2. Clear, descriptive error messages
  3. Proper debug logging
  4. Consistent with ABCI protocol requirements

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
proto/tendermint/abci/types.proto (1)

186-196: Documentation needs enhancement.

The documentation should be expanded to include:

  1. When exactly Tenderdash calls this method in the state sync process
  2. What the application should do with the light block
  3. Error handling expectations
  4. Impact on the node's state

Consider adding a detailed section similar to other ABCI methods:

### RequestFinalizeSnapshot

#### Usage
- Called by Tenderdash after successfully applying all snapshot chunks
- The light block allows the application to verify the final state
- Applications should validate the light block against their restored state
- Any errors during finalization should trigger a state sync retry

#### When does Tenderdash call it?
Detailed sequence of events...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af79697 and a9cf16f.

⛔ Files ignored due to path filters (1)
  • abci/types/types.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (2)
  • proto/tendermint/abci/types.proto (5 hunks)
  • spec/abci++/api.md (7 hunks)
🧰 Additional context used
🪛 Markdownlint (0.37.0)
spec/abci++/api.md

431-431: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


443-443: null
Link fragments should be valid

(MD051, link-fragments)


949-949: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: test_apps
  • GitHub Check: test_abci_cli
  • GitHub Check: tests (05)
  • GitHub Check: tests (04)
  • GitHub Check: tests (03)
  • GitHub Check: tests (02)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: tests (01)
  • GitHub Check: Super linter
  • GitHub Check: tests (00)
  • GitHub Check: e2e-test (dashcore)
  • GitHub Check: golangci-lint
🔇 Additional comments (5)
proto/tendermint/abci/types.proto (3)

35-35: LGTM! Field addition follows protobuf best practices.

The finalize_snapshot field is properly added to the Request message with an appropriate field number.


529-529: LGTM! Field addition follows protobuf best practices.

The finalize_snapshot field is properly added to the Response message with an appropriate field number.


Line range hint 1402-1402: LGTM! Service definition is properly updated.

The FinalizeSnapshot RPC is correctly added to the ABCIApplication service.

spec/abci++/api.md (2)

429-449: Documentation needs enhancement.

The documentation feedback from the proto file review applies here as well.

🧰 Tools
🪛 Markdownlint (0.37.0)

431-431: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines

(MD022, blanks-around-headings)


443-443: null
Link fragments should be valid

(MD051, link-fragments)


21-21: LGTM! Table of contents is properly updated.

The table of contents correctly reflects the new FinalizeSnapshot messages.

Also applies to: 39-39

proto/tendermint/abci/types.proto Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
internal/statesync/syncer.go (4)

8-8: Remove debug-only import.

The os package is only used for debug operations. Consider removing it along with the associated debug code.


174-176: Validate retry counter logic.

The retry counter check should happen before attempting to get the best snapshot to avoid unnecessary processing.

-		if retries > 0 && snapshot == nil && iters > retries {
+		if retries > 0 && iters > retries {
 			return sm.State{}, nil, errNoSnapshots
 		}

375-377: Improve error message for invalid initial height.

The error message should be more descriptive about the expected valid range.

-		return sm.State{}, nil, fmt.Errorf("initial genesis height %d is invalid", state.InitialHeight)
+		return sm.State{}, nil, fmt.Errorf("initial genesis height must be greater than 0, got %d", state.InitialHeight)

608-635: Enhance error handling in finalizeSnapshot.

The method has good validation but could be improved:

  1. Add validation for snapshot parameter
  2. Enhance error wrapping for better debugging
 func (s *syncer) finalizeSnapshot(ctx context.Context, snapshot *snapshot, genesisBlock *types.LightBlock, snapshotBlock *types.LightBlock) error {
+	if snapshot == nil {
+		return fmt.Errorf("nil snapshot provided")
+	}
 	s.logger.Info("Finalizing snapshot restoration",
 		"snapshot", snapshot.Hash.String(),
 		"height", snapshot.Height,
@@ -629,7 +632,7 @@
 	_, err = s.conn.FinalizeSnapshot(ctx, &abci.RequestFinalizeSnapshot{
 		SnapshotBlock: snapshotBlockProto,
 		GenesisBlock:  genesisBlockProto,
-	})
+	}); if err != nil { return fmt.Errorf("ABCI FinalizeSnapshot call failed: %w", err) }
 
 	return err
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac6d385 and 2e89624.

📒 Files selected for processing (1)
  • internal/statesync/syncer.go (12 hunks)
🧰 Additional context used
📓 Learnings (1)
internal/statesync/syncer.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/syncer.go:380-384
Timestamp: 2025-01-24T08:11:01.354Z
Learning: For genesis height conversion from int64 to uint64, using MustConvertUint64 is preferred over error handling since the height is already validated to be positive and cannot exceed uint64 range.
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: test_apps
  • GitHub Check: test_abci_cli
  • GitHub Check: build
  • GitHub Check: tests (02)
  • GitHub Check: tests (01)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: golangci-lint
  • GitHub Check: Super linter
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (1)
internal/statesync/syncer.go (1)

379-383: LGTM! Proper use of MustConvertUint64.

Based on the retrieved learning from @lklimek, using MustConvertUint64 is appropriate here since the height is already validated to be positive and cannot exceed uint64 range.

internal/statesync/syncer.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/statesync/chunks.go (1)

302-315: Add thread-safety documentation.

While the implementation is correct, consider adding documentation about thread-safety guarantees for better API clarity.

-// Pending returns a list of all chunks that have been requested but not yet received.
+// Pending returns a list of all chunks that have been requested but not yet received.
+// This method is thread-safe as it acquires the queue's mutex lock.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10fdbb2 and 3d39ee0.

📒 Files selected for processing (1)
  • internal/statesync/chunks.go (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test_abci_cli
  • GitHub Check: test_apps
  • GitHub Check: build
  • GitHub Check: tests (03)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: tests (01)
  • GitHub Check: e2e-test (dashcore)
  • GitHub Check: Super linter
🔇 Additional comments (3)
internal/statesync/chunks.go (3)

107-110: LGTM! Enhanced retry mechanism.

The change improves the retry mechanism by resetting chunk status to initStatus when re-enqueueing existing chunks, allowing for proper re-fetching.


138-146: LGTM! Improved nil safety handling.

The change properly handles nil chunk content by replacing it with an empty byte slice, maintaining nil safety while allowing empty chunks.


296-299: LGTM! Enhanced timeout error reporting.

The change improves error reporting by including the count of pending chunks in timeout errors, making it easier to diagnose synchronization issues.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/p2p/router.go (1)

360-362: Consider making the queue timeout configurable.

The 1-minute timeout for queue operations is hardcoded. Consider making this configurable through RouterOptions to allow flexibility based on different network conditions or requirements.

 type RouterOptions struct {
     // ... existing fields ...
+    // QueueTimeout is the timeout for queue operations. Defaults to 1 minute.
+    QueueTimeout time.Duration
 }

 func (o *RouterOptions) Validate() error {
     // ... existing validation ...
+    if o.QueueTimeout == 0 {
+        o.QueueTimeout = time.Minute
+    }
     return nil
 }

Then update the timeout creation:

-qCtx, qcancel := context.WithTimeout(ctx, time.Minute)
+qCtx, qcancel := context.WithTimeout(ctx, r.options.QueueTimeout)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18ab249 and 0abe165.

📒 Files selected for processing (1)
  • internal/p2p/router.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: test_apps
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (2)
internal/p2p/router.go (2)

363-370: LGTM! Enhanced logging for chunk messages.

The addition of debug logs for chunk messages will help with troubleshooting state sync issues, which aligns well with the PR's objectives.


376-378: LGTM! Proper timeout handling.

The timeout case in the select statement ensures graceful handling of queue timeouts with appropriate debug logging.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/statesync/chunks.go (2)

148-156: Simplify mutex management.

The manual mutex management with unlockFn is complex and potentially error-prone. Consider using defer q.mtx.Unlock() and restructuring the code to avoid the need for manual unlocking before channel operations.

-	// we need to manage lock manually to securely unlock before sending to applyCh
-	locked := true
-	unlockFn := func() {
-		if locked {
-			q.mtx.Unlock()
-			locked = false
-		}
-	}
-	defer unlockFn()
+	defer q.mtx.Unlock()

317-329: Optimize memory allocation in Pending method.

The method pre-allocates a slice with capacity equal to the total number of items, but it might only need a fraction of that size. Consider using a smaller initial capacity or a more memory-efficient approach.

-	waiting := make([]bytes.HexBytes, 0, len(q.items))
+	// Start with a smaller capacity as we expect most chunks to be received
+	waiting := make([]bytes.HexBytes, 0, len(q.items)/4)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0fc19f and 7398fa4.

📒 Files selected for processing (1)
  • internal/statesync/chunks.go (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: test_abci_cli
  • GitHub Check: test_apps
  • GitHub Check: build
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: Super linter
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (2)
internal/statesync/chunks.go (2)

18-23: LGTM! Well-defined error variables.

The error variables are well-defined with clear, specific messages that help in error handling and debugging.


388-401: LGTM! Well-implemented helper method.

The getItem helper method is well-documented, has proper error handling, and maintains the mutex contract clearly in its documentation.

internal/statesync/chunks.go Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
internal/libs/sync/mutexguard.go (4)

1-5: Consider renaming the package to avoid conflict with standard library.

The package name sync might conflict with Go's standard library package of the same name. Consider using a more specific name like syncutil or mutexutil to avoid potential confusion.

-package sync
+package syncutil

7-20: LGTM! Consider more descriptive interface names.

The interfaces are well-defined and documented. They follow the interface segregation principle and match the standard library's mutex behaviors. Consider using more descriptive names like Locker and RLocker to better reflect their purpose, matching the standard library's sync.Locker interface.

-type Mtx interface {
+type Locker interface {
 	Lock()
 	Unlock()
 }

-type RMtx interface {
+type RLocker interface {
 	RLock()
 	RUnlock()
 }

22-37: Add nil check for defensive programming.

The implementation is thread-safe and effectively prevents double-unlock issues. However, it would be more robust with a nil check.

 func LockGuard(mtx Mtx) UnlockFn {
+	if mtx == nil {
+		panic("mutex is nil")
+	}
 	mtx.Lock()
 	locked := true

39-54: Consider code reuse and add nil check.

The implementation is correct but duplicates the pattern from LockGuard. Consider extracting the common pattern and adding a nil check.

Here's a suggested refactor to reduce duplication:

+// guardFn is a helper type for lock/unlock function pairs
+type guardFn struct {
+	lock   func()
+	unlock func()
+}
+
+func newGuard(fn guardFn) UnlockFn {
+	if fn.lock == nil || fn.unlock == nil {
+		panic("invalid guard functions")
+	}
+	fn.lock()
+	locked := true
+	return func() bool {
+		if locked {
+			locked = false
+			fn.unlock()
+			return true
+		}
+		return false
+	}
+}
+
 func LockGuard(mtx Mtx) UnlockFn {
+	if mtx == nil {
+		panic("mutex is nil")
+	}
-	mtx.Lock()
-	locked := true
-
-	return func() bool {
-		if locked {
-			locked = false
-			mtx.Unlock()
-			return true
-		}
-		return false
-	}
+	return newGuard(guardFn{
+		lock:   mtx.Lock,
+		unlock: mtx.Unlock,
+	})
 }

 func RLockGuard(mtx RMtx) UnlockFn {
+	if mtx == nil {
+		panic("mutex is nil")
+	}
-	mtx.RLock()
-	locked := true
-
-	return func() bool {
-		if locked {
-			locked = false
-			mtx.RUnlock()
-			return true
-		}
-		return false
-	}
+	return newGuard(guardFn{
+		lock:   mtx.RLock,
+		unlock: mtx.RUnlock,
+	})
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7398fa4 and e875f9a.

📒 Files selected for processing (8)
  • .mockery.yaml (1 hunks)
  • internal/consensus/state.go (1 hunks)
  • internal/libs/sync/mutexguard.go (1 hunks)
  • internal/statesync/chunks.go (4 hunks)
  • internal/statesync/chunks_test.go (4 hunks)
  • internal/statesync/mocks/consensusstateprovider.go (1 hunks)
  • internal/statesync/reactor.go (19 hunks)
  • internal/statesync/reactor_test.go (16 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/statesync/chunks_test.go
  • internal/statesync/chunks.go
  • internal/statesync/reactor.go
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: test_apps
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: tests (00)
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (6)
internal/statesync/mocks/consensusstateprovider.go (1)

1-128: LGTM! The mock implementation is well-structured.

The auto-generated mock follows best practices:

  • Proper error handling for unspecified return values
  • Support for both direct value returns and function callbacks
  • Clean initialization with automatic expectation verification
internal/consensus/state.go (1)

310-312: LGTM! The GetCurrentHeight implementation is clean and thread-safe.

The method correctly retrieves the current height from the thread-safe stateDataStore.

internal/statesync/reactor_test.go (3)

85-92: LGTM! The setup function signature update is appropriate.

The addition of csState parameter allows for proper testing of the ConsensusStateProvider functionality.


241-241: LGTM! The FinalizeSnapshot mock expectation is well-defined.

The mock expectation correctly verifies that FinalizeSnapshot is called exactly once with the right parameters.


772-786: LGTM! The retryUntil helper function improves test reliability.

The function properly handles retries with timeout and context cancellation.

.mockery.yaml (1)

56-56: LGTM! The mockery configuration update is correct.

ConsensusStateProvider is properly added to the statesync package interfaces.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/statesync/syncer.go (2)

175-178: Consider improving retry condition.

The retry condition could be more clearly expressed.

-if retries > 0 && snapshot == nil && iters > retries {
+if retries > 0 && snapshot == nil && iters >= retries {

551-553: Consider improving error handling.

The error handling could be more robust here.

-s.logger.Error("failed to request snapshot chunk", "err", err, "chunkID", ID)
-// retry the chunk
-s.chunkQueue.Enqueue(ID)
+s.logger.Error("failed to request snapshot chunk, will retry", 
+    "err", err, 
+    "chunkID", ID,
+    "retry_timeout", s.retryTimeout)
+s.chunkQueue.Enqueue(ID)
+
+// Allow some time before retry to avoid overwhelming the network
+time.Sleep(100 * time.Millisecond)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e875f9a and b39e9c2.

📒 Files selected for processing (1)
  • internal/statesync/syncer.go (12 hunks)
🧰 Additional context used
📓 Learnings (1)
internal/statesync/syncer.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/syncer.go:380-384
Timestamp: 2025-01-24T08:11:01.354Z
Learning: For genesis height conversion from int64 to uint64, using MustConvertUint64 is preferred over error handling since the height is already validated to be positive and cannot exceed uint64 range.
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: test_abci_cli
  • GitHub Check: test_apps
  • GitHub Check: build
  • GitHub Check: tests (05)
  • GitHub Check: tests (04)
  • GitHub Check: tests (02)
  • GitHub Check: tests (01)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: tests (00)
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (6)
internal/statesync/syncer.go (6)

33-34: LGTM: Timeout constant added for chunk requests.

The new constant chunkRequestSendTimeout with a value of 5 seconds is appropriate for network operations.


377-379: LGTM: Initial height validation added.

Good addition of validation to ensure initial height is valid.


381-385: LGTM: Genesis block retrieval.

Correctly using MustConvertUint64 for height conversion as per previous learnings.


605-632: LGTM: Well-implemented snapshot finalization.

The new finalizeSnapshot method:

  • Has proper null checks
  • Includes informative logging
  • Handles proto conversion errors
  • Uses clear error messages

530-537: LGTM: Improved debug logging.

Enhanced logging in the fetchChunks method improves observability.


599-602: LGTM: Added timeout for chunk requests.

Good addition of context timeout for chunk requests using the new constant.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/statesync/chunks.go (1)

405-418: Enhance error handling documentation and messages.

Consider improving the error handling:

  1. Document possible error types (errNilSnapshot and custom errors) in the method comments.
  2. Make the error message more specific about why the chunk wasn't found (e.g., "chunk not in queue" vs "chunk not found").
 // getItem fetches chunk item from the items map. If the item is not found, it returns an error.
+// Returns:
+// - errNilSnapshot if the queue is closed
+// - error if the chunk is not in the queue
 // The caller must hold the mutex lock.
 func (q *chunkQueue) getItem(chunkID bytes.HexBytes) (*chunkItem, error) {
     if q.snapshot == nil {
         return nil, errNilSnapshot
     }
     chunkIDKey := chunkID.String()
     item, ok := q.items[chunkIDKey]
     if !ok {
-        return nil, fmt.Errorf("chunk %x not found", chunkID)
+        return nil, fmt.Errorf("chunk %x not in queue", chunkID)
     }

     return item, nil
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39e9c2 and c18052b.

📒 Files selected for processing (1)
  • internal/statesync/chunks.go (7 hunks)
🧰 Additional context used
📓 Learnings (1)
internal/statesync/chunks.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/chunks.go:304-306
Timestamp: 2025-02-05T10:12:23.371Z
Learning: In the state sync chunk queue's timeout path, the locking in `q.Pending()` is sufficient as it's the only lock in this execution path, making the error reporting thread-safe.
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test_apps
  • GitHub Check: test_abci_cli
  • GitHub Check: build
  • GitHub Check: tests (05)
  • GitHub Check: tests (04)
  • GitHub Check: tests (02)
  • GitHub Check: tests (01)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: e2e-test (dashcore)
  • GitHub Check: golangci-lint
  • GitHub Check: Super linter
🔇 Additional comments (3)
internal/statesync/chunks.go (3)

154-206: LGTM! Well-structured changes to improve robustness.

The changes improve the method by:

  • Adding better error handling with detailed messages
  • Using proper mutex handling to avoid deadlocks
  • Supporting empty chunks
  • Unlocking before channel operations to prevent deadlocks

304-330: LGTM! Thread-safe timeout handling with improved error reporting.

The changes improve error reporting by including the count of pending chunks in the timeout error message. The locking is correctly handled through q.Pending().


332-345: LGTM! Efficient implementation with proper locking.

The method is well-implemented with:

  • Proper mutex handling
  • Efficient slice pre-allocation
  • Clear status checks for pending chunks

internal/statesync/chunks.go Outdated Show resolved Hide resolved
@lklimek
Copy link
Collaborator Author

lklimek commented Feb 5, 2025

@coderabbitai review all

Copy link
Contributor

coderabbitai bot commented Feb 5, 2025

@lklimek: I'll perform a full review of all changes in the PR.

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (26)
libs/math/safemath.go (1)

262-289: Dynamic max function.
Max[T Integer]() provides a wide coverage across all integer types. The panic in the default case clearly signals unsupported types, which is good.

Consider adding a short doc comment that clarifies the function’s usage and mentions the panic for unsupported types.

internal/statesync/syncer.go (4)

33-33: Consider Making the Timeout Configurable
Defining chunkRequestSendTimeout as a constant is acceptable, but allowing it to be overridden via configuration can make it more flexible for different environments.

-const chunkRequestSendTimeout = 5 * time.Second
+// Optionally make this configurable in the config
+const chunkRequestSendTimeout = 5 * time.Second

176-178: Validate Boundary Condition
When retries > 0 && snapshot == nil && iters > retries, the code returns errNoSnapshots. Confirm that iters > retries (rather than >=) is the intended boundary check.


599-603: Per-Send Context Timeout
Wrapping s.chunkCh.Send with a context.WithTimeout helps prevent indefinite blocking when the channel is congested.


605-632: Suggest Additional Validations
finalizeSnapshot correctly checks for nil blocks and calls FinalizeSnapshot on the ABCI app. For completeness, consider verifying that the snapshotBlock height matches snapshot.Height and that genesisBlock is indeed the correct initial block for the chain.

internal/p2p/router.go (1)

360-365: Per-Message Enqueue Timeout
Wrapping the enqueue operation in context.WithTimeout(ctx, time.Minute) prevents the router from indefinitely blocking. Consider making this duration configurable.

internal/statesync/reactor.go (5)

72-73: Consider making minPeers configurable.

Currently hard-coded to 2, the TODO hints this may need to adapt to different network scenarios. Configurability could help operators tune the state sync process.


141-148: Expand documentation on the new interface.

ConsensusStateProvider looks straightforward. Including docstrings on possible side effects and expected error scenarios can ensure clarity for implementers.


170-170: Re-check parameter positioning.

Adding csState at the end is logical for optional parameters. If this is mandatory, consider reordering so that optional arguments remain last.


251-251: Evaluate log level for initialization message.

Currently using Info for an internal initialization log. Consider downgrading to Debug in high-traffic or production systems to reduce log noise.


337-337: Repeated peer checks.

Calling waitForEnoughPeers again ensures consistency but might be refactored for better maintenance if repeated often.

types/validator_test.go (2)

119-119: Fix typo in test function name.

The test function name has a typo: "TestValdiatorSetHash" should be "TestValidatorSetHash".

-func TestValdiatorSetHash(t *testing.T) {
+func TestValidatorSetHash(t *testing.T) {

128-128: Remove commented out code.

The commented out require.NoError(t, err) line appears to be unnecessary.

-    // require.NoError(t, err)
libs/math/safemath_test.go (1)

88-191: Consider using table-driven subtests for better test organization.

While the test cases are comprehensive, using t.Run() with table-driven subtests would improve test output and failure isolation.

Here's how you could refactor the test:

 func TestSafeConvert(t *testing.T) {
 	testCases := []struct {
+		name string
 		from interface{}
 		want interface{}
 		err  bool
 	}{
-		{int(0), int64(0), false},
+		{
+			name: "int zero to int64",
+			from: int(0),
+			want: int64(0),
+			err:  false,
+		},
 		// ... other test cases ...
 	}

 	for i, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
 			var result interface{}
 			var err error
 			// ... rest of the test logic ...
+		})
 	}
 }
abci/types/application.go (1)

102-105: Consider enhancing the documentation comment.

The current comment could be more descriptive about when this method is called and what it means for the application.

-// FinalizeSnapshot provides a no-op implementation for the StateSyncer interface
+// FinalizeSnapshot is called after successful state sync completion to provide the application
+// with the light block at the sync height. This allows the application to verify the state
+// matches the expected hash and perform any necessary finalization steps. The default
+// implementation is a no-op.
 func (BaseApplication) FinalizeSnapshot(_ context.Context, _req *RequestFinalizeSnapshot) (*ResponseFinalizeSnapshot, error) {
config/toml.go (1)

501-505: LGTM! Consider enhancing the retry documentation.

The removal of trust-based parameters and addition of retry mechanism looks good. Consider adding a note about the impact of retries on state sync duration.

 # Number of times to retry state sync. When retries are exhausted, the node will
 # fall back to the regular block sync. Set to 0 to disable retries. Default is 3.
 # Note that in pessimistic case, it will take at least (discovery-time * retries) before
-# falling back to block sync.
+# falling back to block sync. For example, with discovery-time=30s and retries=3,
+# it may take up to 90 seconds before falling back to block sync.
 retries = {{ .StateSync.Retries }}
abci/example/kvstore/kvstore.go (1)

583-606: LGTM! Consider adding metrics for snapshot finalization.

The implementation correctly verifies the snapshot state before finalization. Consider adding metrics to track successful/failed snapshot finalizations.

 func (app *Application) FinalizeSnapshot(_ctx context.Context, req *abci.RequestFinalizeSnapshot) (*abci.ResponseFinalizeSnapshot, error) {
 	app.mu.Lock()
 	defer app.mu.Unlock()
+
+	// Track snapshot finalization metrics
+	defer func(start time.Time) {
+		app.metrics.StateSync.FinalizeSnapshotTime.Observe(time.Since(start).Seconds())
+	}(time.Now())

 	// we only run some verifications here

 	if app.LastCommittedState.GetHeight() != req.SnapshotBlock.SignedHeader.Header.Height {
internal/statesync/reactor_test.go (1)

444-446: Consider adding test cases for consensus state provider.

The consensus state provider is added but only used for height validation. Consider adding test cases for error scenarios.

// Example test case:
func TestReactor_Sync_ConsensusStateError(t *testing.T) {
    cp := mocks.NewConsensusStateProvider(t)
    cp.On("GetCurrentHeight").Return(int64(0)).Once()
    rts := setup(ctx, t, nil, nil, cp, 100)
    // Test error handling...
}
internal/mempool/reactor.go (1)

97-97: Adjusted Logging Level for Peer Updates
Switching from a higher log level to Trace for logging peer updates provides finer-grained output. Ensure that your deployment’s logging configuration filters trace logs in production to avoid log flooding.

internal/blocksync/reactor.go (1)

162-162: Changed Log Level to Trace for Peer Updates
The change to use Trace instead of a more prominent log level (e.g., Debug) for recording peer update events enhances traceability while keeping production logs cleaner. Confirm that trace-level logging is enabled only when detailed diagnostics are needed.

internal/evidence/reactor.go (1)

204-204: Logging Level Adjusted for Peer Updates
Updating the log statement to Trace for peer updates aligns with the overall logging refinement across the codebase. Verify that this additional verbosity is controlled by the log configuration in production environments.

proto/tendermint/abci/types.proto (1)

913-914: New FinalizeSnapshot RPC Method:
The service definition now includes the FinalizeSnapshot RPC. This addition correctly routes FinalizeSnapshot requests to the ABCI application. Please ensure that its usage is thoroughly covered by unit and end-to-end tests.

spec/abci++/api.md (4)

430-447: API Documentation – RequestFinalizeSnapshot Section:
The new section documents when and how RequestFinalizeSnapshot is used—including the fields snapshot_block and genesis_block—and explains that the application should validate the restored state. It might be beneficial to add an explicit note that any validation failure should result in an error returned directly (thus causing a restart), to reinforce the operational expectations.

🧰 Tools
🪛 LanguageTool

[duplication] ~431-~431: Possible typo: you repeated a word.
Context: ...abci-RequestFinalizeSnapshot"> ### RequestFinalizeSnapshot RequestFinalizeSnapshot is called by Tenderdash after successfu...

(ENGLISH_WORD_REPEAT_RULE)

🪛 markdownlint-cli2 (0.17.2)

431-431: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


443-443: Link fragments should be valid
null

(MD051, link-fragments)


444-444: Link fragments should be valid
null

(MD051, link-fragments)


949-956: API Documentation – ResponseFinalizeSnapshot Section:
The documentation for ResponseFinalizeSnapshot is concise, reflecting its empty structure as required. For clarity, consider adding a brief note that any snapshot finalization error must be communicated as an RPC error rather than through this empty message.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

950-950: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


431-431: Documentation Formatting:
Static analysis highlights that headings should be surrounded by blank lines (e.g. before “### RequestFinalizeSnapshot”). Please ensure to update the markdown formatting accordingly.

🧰 Tools
🪛 LanguageTool

[duplication] ~431-~431: Possible typo: you repeated a word.
Context: ...abci-RequestFinalizeSnapshot"> ### RequestFinalizeSnapshot RequestFinalizeSnapshot is called by Tenderdash after successfu...

(ENGLISH_WORD_REPEAT_RULE)

🪛 markdownlint-cli2 (0.17.2)

431-431: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


950-950: Documentation Formatting:
Similarly, ensure that the “### ResponseFinalizeSnapshot” heading is surrounded by blank lines to meet markdownlint guidelines.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

950-950: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa4978a and ae0b843.

⛔ Files ignored due to path filters (2)
  • abci/types/types.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (43)
  • .mockery.yaml (1 hunks)
  • abci/client/grpc_client.go (1 hunks)
  • abci/client/mocks/client.go (1 hunks)
  • abci/client/routed_client.go (1 hunks)
  • abci/client/socket_client.go (2 hunks)
  • abci/example/kvstore/kvstore.go (1 hunks)
  • abci/types/application.go (2 hunks)
  • abci/types/messages.go (1 hunks)
  • abci/types/mocks/application.go (1 hunks)
  • config/config.go (3 hunks)
  • config/toml.go (1 hunks)
  • go.mod (3 hunks)
  • internal/blocksync/reactor.go (1 hunks)
  • internal/consensus/state.go (1 hunks)
  • internal/consensus/state_data.go (1 hunks)
  • internal/evidence/reactor.go (1 hunks)
  • internal/libs/sync/mutexguard.go (1 hunks)
  • internal/mempool/reactor.go (1 hunks)
  • internal/p2p/router.go (1 hunks)
  • internal/p2p/transport_mconn.go (1 hunks)
  • internal/proxy/client.go (1 hunks)
  • internal/statesync/chunks.go (7 hunks)
  • internal/statesync/chunks_test.go (4 hunks)
  • internal/statesync/mocks/consensusstateprovider.go (1 hunks)
  • internal/statesync/mocks/stateprovider.go (1 hunks)
  • internal/statesync/peer.go (2 hunks)
  • internal/statesync/reactor.go (19 hunks)
  • internal/statesync/reactor_test.go (16 hunks)
  • internal/statesync/stateprovider.go (5 hunks)
  • internal/statesync/syncer.go (12 hunks)
  • internal/statesync/syncer_test.go (12 hunks)
  • libs/math/safemath.go (3 hunks)
  • libs/math/safemath_test.go (1 hunks)
  • proto/tendermint/abci/types.proto (5 hunks)
  • spec/abci++/api.md (7 hunks)
  • test/e2e/networks/rotate.toml (2 hunks)
  • test/e2e/pkg/mockcoreserver/core_server.go (1 hunks)
  • test/e2e/pkg/mockcoreserver/methods.go (1 hunks)
  • test/e2e/runner/setup.go (0 hunks)
  • test/e2e/runner/start.go (1 hunks)
  • types/quorum_sign_data.go (1 hunks)
  • types/validator_test.go (2 hunks)
  • version/version.go (1 hunks)
💤 Files with no reviewable changes (1)
  • test/e2e/runner/setup.go
🧰 Additional context used
📓 Learnings (5)
libs/math/safemath_test.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: libs/math/safemath.go:123-161
Timestamp: 2025-01-20T13:38:10.707Z
Learning: The `SafeConvert` function in `libs/math/safemath.go` correctly handles negative values during integer type conversion through two mechanisms: explicit checks for int64 to uint64 conversion, and the Min[U]() check that catches negative values when converting to unsigned types.
proto/tendermint/abci/types.proto (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: proto/tendermint/abci/types.proto:645-648
Timestamp: 2025-01-20T11:38:49.099Z
Learning: In ABCI protocol, FinalizeSnapshot call should return error directly for validation failures rather than using error fields in ResponseFinalizeSnapshot message, as these failures are considered fatal consensus failures that should cause Tenderdash to restart.
internal/statesync/chunks.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/chunks.go:304-306
Timestamp: 2025-02-05T10:12:23.371Z
Learning: In the state sync chunk queue's timeout path, the locking in `q.Pending()` is sufficient as it's the only lock in this execution path, making the error reporting thread-safe.
libs/math/safemath.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: libs/math/safemath.go:123-161
Timestamp: 2025-01-20T13:38:10.707Z
Learning: The `SafeConvert` function in `libs/math/safemath.go` correctly handles negative values during integer type conversion through two mechanisms: explicit checks for int64 to uint64 conversion, and the Min[U]() check that catches negative values when converting to unsigned types.
internal/statesync/syncer.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/syncer.go:380-384
Timestamp: 2025-01-24T08:11:01.354Z
Learning: For genesis height conversion from int64 to uint64, using MustConvertUint64 is preferred over error handling since the height is already validated to be positive and cannot exceed uint64 range.
🪛 LanguageTool
spec/abci++/api.md

[duplication] ~431-~431: Possible typo: you repeated a word.
Context: ...abci-RequestFinalizeSnapshot"> ### RequestFinalizeSnapshot RequestFinalizeSnapshot is called by Tenderdash after successfu...

(ENGLISH_WORD_REPEAT_RULE)

🪛 markdownlint-cli2 (0.17.2)
spec/abci++/api.md

431-431: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


443-443: Link fragments should be valid
null

(MD051, link-fragments)


444-444: Link fragments should be valid
null

(MD051, link-fragments)


950-950: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (124)
internal/statesync/chunks_test.go (4)

4-4: LGTM! Good improvement in error handling.

Adding the errors package enables the use of errors.Is() for more robust error type checking.


104-107: LGTM! Enhanced test failure messages.

Including the test case index in error messages improves debugging by making it clear which test case failed.


248-249: LGTM! Improved error handling practices.

The changes from direct error comparison to errors.Is() and require.ErrorIs() follow Go's best practices for error handling, properly handling wrapped errors.

Also applies to: 288-288


1-368: Verify test coverage for state sync functionality.

While this file has good test coverage for chunk queue functionality, please ensure that the new FinalizeSnapshot functionality mentioned in the PR objectives is covered in other test files.

Let's check for test coverage of the new functionality:

libs/math/safemath.go (16)

11-11: New error constant is fine.
Declaring ErrOverflowUint64 helps ensure more precise error reporting for operations involving uint64 overflows.


15-15: Broad overflow error is acceptable.
ErrOverflow serves as a general catch-all and can be reused in multiple scenarios, providing good flexibility.


99-102: Delegate safely to the generic converter.
Using SafeConvert[T, uint64] within SafeConvertUint64() centralizes logic in one place, reducing duplication.


104-107: Consistent approach.
SafeConvertInt64() likewise leverages the generic helper, maintaining a single overflow-handling pathway.


109-112: Concise usage.
Similarly, SafeConvertInt16() relies on SafeConvert, which is a clean and DRY practice.


114-117: No issues found.
SafeConvertUint16() similarly benefits from the generic approach for overflow checks.


123-161: Robust generic conversion logic.
SafeConvert[T, U] provides a well-encapsulated overflow-check mechanism for various integer types. The special cases for 64-bit conversions and fallback checks with Max[U]() and Min[U]() are clearly outlined. Be aware of the small performance overhead from runtime type checks, but otherwise this is solid.


163-170: Convenient must-convert method.
MustConvert is useful where callers want to avoid repetitive error handling. Panicking is acceptable here, as the name signals the behavior clearly.


172-174: Helper to unify usage.
MustConvertUint64 is a straightforward alias to reduce repetitive type parameters. No concerns here.


176-178: Same pattern, no issues.
MustConvertInt64 also aligns with the same approach of reusing the generic method.


180-182: Completes the coverage.
MustConvertUint16 adds completeness for smaller integer conversions.


184-186: Likewise consistent.
MustConvertInt16 continues the pattern without introducing new risks.


188-190: No concerns raised.
MustConvertUint8 is consistent with the rest of the must-convert suite.


192-194: Uniform approach.
MustConvertUint similarly extends the coverage to uint.


196-198: Completes integer conversions.
MustConvertInt follows the same safe pattern and is well-named.


291-309: Dynamic min function with sound default for unsigned.
Min[T Integer]() addresses the entire integer range, returning 0 for unsigned types. Good job indicating the fallback with a panic for unsupported types.

internal/statesync/chunks.go (17)

4-5: Appropriate imports.
Imports for sha256 and hex are used for generating a unique chunk filename. No issues here.


14-14: Namespace alias is fine.
Import alias tmsync from internal/libs/sync to avoid collisions is acceptable for clarity.


70-84: Filename generation logic is sound.
Hashing chunkID to generate a deterministic file name ensures uniqueness and avoids collision among chunks. The doc comments are clear about concurrency.


156-157: Guard against nil chunk references.
Checking for chunk == nil with an immediate error prevents panics or undefined behavior.


160-165: Ensure not passing a nil byte slice.
This snippet enforces []byte{} instead of nil, which helps avoid write errors later.


166-167: Lock guard usage is good.
Calling LockGuard(&q.mtx) is a neat encapsulation that automatically unlocks when the function returns. Helps reduce risk of forgetting unlock calls.


169-172: Graceful error handling.
q.getItem(chunk.ID) surfaces an explicit error message if the chunk item is missing.


179-182: Chunk validation performed prior to writing.
Ensures the chunk belongs to the correct snapshot and height. Good step to reduce corruption.


184-189: Obtain filename safely.
item.Filename(q.dir) sets up the chunk’s file path. Logging the error with chunk ID is helpful for debugging.


190-190: File writing is done next.
item.write(data) is executed after ensuring the path is valid, which is the logical sequence.


197-199: Unlocking before applying.
Releasing the mutex prior to sending on applyCh avoids potential deadlocks if applyCh is blocked. This is a sensible concurrency practice.


202-205: Signal waiters.
Locking again to call item.closeWaitChs(true) ensures threadsafe channel closure. This is handled carefully.


327-330: Timeout error with count of pending chunks.
Providing len(q.Pending()) in the error clarifies how many chunks remain. This is good for debugging.


333-346: Pending chunk retrieval.
The Pending() method aggregates chunk IDs that remain unreceived. Straightforward approach, no concurrency concerns.


356-363: Reusable helper for chunk re-queuing.
retry() resets status to initStatus and re-appends the chunk to requestQueue, which is consistent with how chunks are originally enqueued.


406-419: Get item validation.
getItem() ensures the queue is open (snapshot != nil) and the item is present. Returns a descriptive error if not found, which is a well-structured approach.


421-423: Validation method reference.
validateChunk() is a good single place to confirm chunk alignment with the snapshot. No concurrency or logic issues spotted.

internal/statesync/stateprovider.go (5)

43-44: New LightBlock method in the interface.
Exposing LightBlock retrieval broadens the interface's utility and aligns well with other block-specific methods.


126-131: RPC-based LightBlock retrieval.
LightBlock() in stateProviderRPC leverages the existing verifyLightBlockAtHeight logic. This is consistent with the other methods and ensures validated block data.


223-224: Minimum peers check.
Ensuring at least minPeers providers in NewP2PStateProvider helps maintain a safe threshold for light client security.


227-228: Light client creation for P2P.
Using light.NewClient(..., dashCoreClient, ...) parallels the RPC version. Properly accepts multiple providers for redundancy.


271-276: LightBlock in P2P.
Similar lock usage and verification approach as stateProviderRPC. The consistency helps ensure uniform data retrieval no matter the strategy.

internal/statesync/syncer.go (19)

19-19: No Issues Found
Importing tmmath looks fine and is consistent with usage elsewhere.


96-96: Good Log Key Renaming
Using "chunkID", chunk.ID is more consistent with other log fields.


127-128: Reduced Logging Noise
Switching from Info to Debug for snapshots not added helps avoid clutter in production logs.


156-156: New retries Parameter
Introducing a retries int parameter for SyncAny increases control over how many times the procedure will attempt to find snapshots.


190-193: Re-request Snapshots
Re-requesting snapshots upon discovery failure helps ensure a robust sync process. Adding logs or metrics may further assist in debugging repeated failures.


220-220: Potential Overflow Check
MustConvertInt64(snapshot.Height) can panic if snapshot.Height exceeds math.MaxInt64. Verify that snapshot heights are guaranteed to remain within int64 range.


347-347: Improved Error Checking
Using errors.Is to detect light.ErrNoWitnesses is more robust and recommended over direct comparisons.


355-356: Transition to Light Block
Retrieving a light block instead of a commit aligns with the new approach for final verification.


364-364: Clearer Error Message
Returning a more descriptive error regarding missing witnesses clarifies the failure cause.


366-366: Straightforward Logging
Logging the failure to verify the light block before discarding the snapshot is consistent with the approach used for other errors.


377-379: Sanity Check on InitialHeight
Ensuring InitialHeight >= 1 prevents invalid genesis states and fails fast with a clear error.


381-385: Consistent with Learnings on MustConvertUint64
Converting state.InitialHeight to uint64 here is valid, since we have a prior check ensuring it is positive.


387-390: Validate Snapshot Finalization
Immediately returning an error if finalizeSnapshot fails prevents partially finalized states.


403-403: Return the LightBlock Commit
Revising the return type to use the block’s commit is consistent with the shift towards light blocks.


530-536: Helpful Debug Messages
Logging when the fetch chunk queue is empty or times out provides clear traceability when diagnosing synchronization stalls.


542-542: Added Error Context in Logs
Including "err", err in the debug statement clarifies why the queue is waiting again.


548-548: Consistent Log Key
Using "chunkID", ID matches the naming convention seen in other logging statements.


551-553: Retry Mechanism on Failure
Re-enqueuing chunks instead of immediately aborting ensures resilience if a chunk request fails at first attempt.


560-561: Chunk Timeout Retry
Logging a timeout for chunk reception and retrying is a good approach for robust chunk acquisition.

internal/statesync/syncer_test.go (7)

86-86: Explicit InitialHeight in Tests
Setting InitialHeight: 1 in the test ensures state init logic is properly checked.


103-115: LightBlock Construction
Defining both a Commit and a SignedHeader in the test suite simulates real block data, enabling more realistic test coverage.


131-132: Mocking LightBlock Retrieval
Stubbing out On("LightBlock", ...) ensures sync logic that depends on the final block data is properly verified.


268-269: Testing FinalizeSnapshot
Adding a mock expectation for FinalizeSnapshot validates the snapshot finalization flow and ensures the method is invoked as intended.


280-280: Zero-Retries Case
Calling SyncAny with (ctx, 0, 0, ...) tests the flow without extra attempts, ensuring coverage of the one-shot scenario.


297-298: Robust Error Path Testing
These lines confirm correct error behavior (errNoSnapshots, errAbort, or custom errors) under different snapshot rejection scenarios. This broadens test coverage for edge conditions.

Also applies to: 323-324, 373-374, 416-417, 470-471, 498-499


627-628: Chunk Queue Initialization
Explicitly creating and enqueuing chunks checks that snapshot chunk logic is triggered within tests.

internal/p2p/router.go (2)

368-370: Conditional Logging for ChunkChannel
Debug-logging sent chunk messages increases transparency in the gossip layer while avoiding verbose logs for other channels.


376-378: Timeout Handling
Dropping a message after the enqueue context times out is a practical safeguard against resource exhaustion.

internal/statesync/reactor.go (12)

24-24: Approved import usage.

The new import for tmmath is well-placed, providing crucial numeric-safe conversions for subsequent logic.


138-139: Validate concurrency handling on csState.

Introducing csState can be helpful but be cautious if it's accessed by multiple goroutines. Confirm calls are protected by locks to avoid race conditions.


254-254: Peer availability pre-check.

Verifying peer count before proceeding helps ensure state sync is viable. Good defensive approach.


264-265: P2P state provider initialization.

Removal of trust parameters aligns with the updated approach. The method signature is coherent.


273-273: RPC state provider update.

Similarly removing trust-based parameters. Consistent with the revised design.


292-306: Fallback logic when no snapshots.

Gracefully falling back to block sync is a practical solution if no snapshots are found. Verify that all edge cases (e.g., corrupted snapshots) are covered by tests.


355-355: Confirm SyncAny test coverage.

Linking SyncAny to configuration values requires thorough testing (especially varied DiscoveryTime and Retries) to ensure robust error-handling.


722-722: Explicit handling of unknown message types.

Returning errors for unrecognized channel messages is essential to prevent silent failures or unexpected behaviors.

Also applies to: 803-803, 864-864, 912-912


997-997: Trace-level logging for peer updates.

Using Trace helps keep the logs less cluttered in production. Consider switching to Debug if these events become critical to track.

Also applies to: 1006-1006


1064-1069: Guard against nil consensus state.

If csState is absent, returning an empty snapshot list avoids potential nil pointer errors. This is a solid defensive check.


1095-1101: Filtering snapshots close to current height.

Skipping snapshots that are too recent ensures that a finalized commit is available. Correct approach for stable snapshot data.


1117-1117: Beware of possible overflow in MustConvertInt64.

MustConvertInt64 will panic on extreme inputs. Consider a more graceful failure if heights exceed int64 limits.

version/version.go (1)

14-14: New ABCI semantic version.

Updating ABCISemVer to 1.3.0 reflects added functionalities. Ensure this is consistently documented throughout release notes.

internal/libs/sync/mutexguard.go (5)

1-5: Definition of UnlockFn.

Returning a boolean status is a helpful mechanism to detect redundant unlock attempts.


7-12: Minimal Mtx interface.

This design allows interchangeability with standard or custom mutex implementations.


14-20: RMtx interface for read locks.

Similarly provides flexibility for read locks and potential custom read-write lock implementations.


22-37: LockGuard function.

Encapsulating lock/unlock behavior in this higher-level construct simplifies usage and promotes safer concurrency patterns.


39-54: RLockGuard function.

Mirrors the read-lock side effectively. Both guard functions improve usability and reduce the risk of forgetting unlock calls.

test/e2e/runner/start.go (1)

91-91: LGTM! Code simplification improves readability.

The changes effectively simplify the waitForHeight logic by removing unnecessary intermediate variables while maintaining the same functionality.

test/e2e/pkg/mockcoreserver/methods.go (1)

68-68: LGTM! Proper handling of QuorumHash parameter.

The change correctly includes QuorumHash in the unmarshalCmd parameters, ensuring proper verification functionality.

internal/statesync/mocks/consensusstateprovider.go (1)

1-127: LGTM! Well-structured auto-generated mock.

The auto-generated mock follows standard patterns and includes proper error handling, type assertions, and documentation.

internal/statesync/peer.go (1)

99-99: LGTM! Appropriate logging level adjustment.

The change from Info to Trace level is appropriate for these peer update logs, as they are frequent operational events that don't need high visibility.

Also applies to: 109-109

abci/types/messages.go (1)

90-94: LGTM! Consistent implementation.

The new ToRequestFinalizeSnapshot function follows the established pattern for request conversion functions and integrates well with the state sync feature.

abci/types/application.go (1)

19-20: LGTM! Well-integrated interface addition.

The new FinalizeSnapshot method is a logical addition to the StateSyncer interface for completing the state sync process.

internal/statesync/mocks/stateprovider.go (2)

148-176: Implementation follows mock patterns correctly.

The LightBlock method implementation follows the established mock patterns with proper error handling and type checking.


178-205: Helper methods are properly implemented.

The mock helper methods (Run, Return, RunAndReturn) are correctly implemented following the established patterns.

types/quorum_sign_data.go (1)

164-166: Verify the QuorumHash size constant change.

The validation now uses crypto.QuorumHashSize instead of crypto.DefaultHashSize. This change could affect existing code that relies on the previous validation.

Run the following script to check for any other occurrences of QuorumHash validation:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Search for QuorumHash size validations
rg -A 2 "QuorumHash.*len|len.*QuorumHash"

# Search for DefaultHashSize usage
rg -A 2 "DefaultHashSize"

Length of output: 19354


QuorumHash validation change verified.

The updated validation in types/quorum_sign_data.go now correctly uses crypto.QuorumHashSize. This change is intentional and clarifies that the quorum hash is being validated with a dedicated constant. Note that in crypto/crypto.go, QuorumHashSize is defined as equal to crypto.DefaultHashSize (32), so the change does not alter the effective size check but improves semantic clarity for quorum-specific code.

  • types/quorum_sign_data.go now validates using crypto.QuorumHashSize.
  • crypto/crypto.go shows that QuorumHashSize is assigned the same value as DefaultHashSize.
  • Other parts of the code continue to use DefaultHashSize where appropriate, so the change is localized and intentional.
internal/proxy/client.go (1)

215-218: Implementation follows proxy patterns correctly.

The FinalizeSnapshot method is properly implemented with metrics timing and follows the established proxy patterns.

abci/client/grpc_client.go (1)

258-260: Implementation follows gRPC patterns correctly.

The FinalizeSnapshot method is properly implemented with WaitForReady and follows the established gRPC patterns.

internal/consensus/state.go (1)

310-312: LGTM! Clean implementation of height accessor.

The GetCurrentHeight method provides a clean and thread-safe way to access the current height from the state data store.

abci/client/routed_client.go (1)

354-357: LGTM! Consistent implementation of ABCI method.

The FinalizeSnapshot implementation follows the established pattern for ABCI methods in the routed client, properly handling delegation and response type casting.

abci/client/socket_client.go (2)

360-366: LGTM! Consistent implementation of ABCI method.

The FinalizeSnapshot implementation follows the established pattern for ABCI methods in the socket client, properly handling request/response flow and error cases.


440-441: LGTM! Proper request/response type matching.

The addition of FinalizeSnapshot case in resMatchesReq ensures proper type matching for the new message type.

internal/p2p/transport_mconn.go (1)

187-187: Verify impact of default port change.

The default port has been changed from 26657 to 26656. This change could affect network connectivity for existing deployments.

Please confirm:

  1. Is this change intentional and documented?
  2. Has this been communicated to node operators?
  3. Are there any migration steps needed?
abci/types/mocks/application.go (1)

261-318: LGTM! Auto-generated mock implementation.

The new FinalizeSnapshot mock method and related types follow the established pattern for mock implementations.

internal/statesync/reactor_test.go (2)

241-242: LGTM! Mock expectation for FinalizeSnapshot.

The mock expectation correctly verifies that FinalizeSnapshot is called exactly once.


772-786: LGTM! Helper function for retry logic.

The retryUntil helper function provides a clean way to handle retries in tests.

abci/client/mocks/client.go (1)

365-422: LGTM! The FinalizeSnapshot mock implementation is well-structured.

The implementation follows the established pattern in the codebase, with proper error handling, type assertions, and the standard mock method structure.

config/config.go (2)

1006-1011: Well-documented retry configuration.

The documentation clearly explains the retry behavior and its implications, particularly noting the pessimistic case timing with discovery-time.


1065-1067: LGTM! Proper validation for the new retries field.

The validation ensures retries cannot be negative, maintaining configuration integrity.

.mockery.yaml (1)

56-56: LGTM! ConsensusStateProvider interface added correctly.

The addition follows the file's structure and maintains proper organization.

test/e2e/networks/rotate.toml (2)

155-155: LGTM! Enabled P2P state sync for validator06.

The configuration aligns with the new state sync implementation.


195-195: LGTM! Switched full01 to P2P state sync.

The change maintains consistency with the new state sync approach across different node types.

test/e2e/pkg/mockcoreserver/core_server.go (1)

149-150: Simplified Result Assignment in QuorumVerify
The inline initialization of the btcjson.QuorumVerifyResult with the computed signatureVerified is clear and succinct. Please verify that this simplification preserves all intended behavior from the previous implementation.

internal/consensus/state_data.go (1)

122-122: Minor Comment Correction
The typographical correction in the comment (changing "nad" to "and") improves readability without affecting functionality.

go.mod (3)

12-13: Dependency Update – dashd-go and btcec/v2:
The versions for “github.com/dashpay/dashd-go” (updated to v0.26.1) and its indirect dependency “github.com/dashpay/dashd-go/btcec/v2” (updated to v2.2.0) have been bumped. Please ensure that these updates are fully compatible with the new FinalizeSnapshot functionality (and any other state sync-related features) across the codebase.


84-84: Stable Version Upgrade – btclog:
The upgrade from a pseudo-version to “github.com/btcsuite/btclog v1.0.0” is a positive change for stability. Verify that no API conflicts occur due to this change.


112-112: Dependency Bump – btcutil:
Updating “github.com/dashpay/dashd-go/btcutil” to v1.3.0 aligns with the other dependency updates. Please double-check that any API changes in this library are handled appropriately in the FinalizeSnapshot and related implementations.

proto/tendermint/abci/types.proto (4)

35-35: FinalizeSnapshot Field Integration:
A new field finalize_snapshot (tag 20) has been added to the oneof block in the Request message. This integration is clear; just verify that tag 20 does not conflict with any reserved or previously used tags.


193-198: New RequestFinalizeSnapshot Message:
The new message correctly defines two fields—snapshot_block and genesis_block—using the LightBlock type. This design meets the PR objective to allow state validation upon snapshot finalization. Please ensure that the application’s validation logic uses these fields appropriately.


531-532: FinalizeSnapshot Response in Oneof:
The addition of the finalize_snapshot field (with tag 17) in the Response oneof is consistent with our design: the response remains empty so that any errors are returned as RPC errors rather than via response fields.


647-650: New ResponseFinalizeSnapshot Message:
The newly introduced response message is intentionally empty, matching our consensus that in case of validation issues the ABCI application should return an error directly. This is aligned with previous design discussions and learnings.

spec/abci++/api.md (2)

21-22: TOC Update – RequestFinalizeSnapshot Entry:
The Table of Contents has been updated appropriately to include a link to the new RequestFinalizeSnapshot section. This helps users quickly locate the new functionality in the documentation.


39-40: TOC Update – ResponseFinalizeSnapshot Entry:
Similarly, adding the ResponseFinalizeSnapshot entry to the TOC provides a complete view of the new message types added.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
libs/math/safemath.go (2)

99-117: Enhance function documentation.

While the functions are well-implemented, their documentation could be more descriptive. Consider adding:

  • Parameter descriptions
  • Return value descriptions
  • Example usage

Apply this diff to improve the documentation:

-// SafeConvertUint64 takes a int and checks if it overflows.
+// SafeConvertUint64 safely converts a value of type T to uint64.
+//
+// Parameters:
+//   - a: The value to convert
+//
+// Returns:
+//   - uint64: The converted value
+//   - error: ErrOverflow if the conversion would overflow
+//
+// Example:
+//   val, err := SafeConvertUint64(int32(42))

262-311: Consider enhancing error handling in Max/Min functions.

While the implementation is solid, consider replacing panics with errors for better error handling in production code.

Apply this diff to improve error handling:

-func Max[T Integer]() uint64 {
+func Max[T Integer]() (uint64, error) {
 	var max T
 	switch any(max).(type) {
 	// ... existing cases ...
 	default:
-		panic("unsupported type")
+		return 0, fmt.Errorf("unsupported type %T", max)
 	}
 }

-func Min[T Integer]() int64 {
+func Min[T Integer]() (int64, error) {
 	switch any(T(0)).(type) {
 	// ... existing cases ...
 	default:
-		panic("unsupported type")
+		return 0, fmt.Errorf("unsupported type %T", T(0))
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0b843 and a4de86a.

📒 Files selected for processing (1)
  • libs/math/safemath.go (3 hunks)
🧰 Additional context used
📓 Learnings (1)
libs/math/safemath.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: libs/math/safemath.go:123-161
Timestamp: 2025-01-20T13:38:10.707Z
Learning: The `SafeConvert` function in `libs/math/safemath.go` correctly handles negative values during integer type conversion through two mechanisms: explicit checks for int64 to uint64 conversion, and the Min[U]() check that catches negative values when converting to unsigned types.
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: test_abci_cli
  • GitHub Check: test_apps
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (3)
libs/math/safemath.go (3)

11-11: LGTM! Error variables are well-defined.

The new error variables follow the established pattern and provide appropriate granularity for error handling.

Also applies to: 15-15


123-161: LGTM! Robust implementation with comprehensive edge case handling.

The implementation correctly handles:

  • Special cases for int64/uint64 conversions
  • Platform-specific integer sizes
  • Negative value conversions
  • Overflow checks using Max/Min bounds

163-198: LGTM! Well-implemented panic wrappers.

The MustConvert functions provide a clean API for cases where panics are acceptable, and the error messages are now correctly implemented using a zero value of the target type.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
types/validator_test.go (3)

3-14: Remove unused import.

The fmt package is imported but not used in the code.

-	"fmt"

118-119: Fix typo in function name.

The function name contains a typo: "Valdiator" should be "Validator".

-func TestValdiatorSetHashVectors(t *testing.T) {
+func TestValidatorSetHashVectors(t *testing.T) {

118-135: Enhance test documentation and coverage.

The test uses hardcoded test vectors without documenting their source or significance. Consider:

  1. Adding comments explaining the source/significance of the test vectors
  2. Using test table pattern for multiple test cases
  3. Adding comments explaining the expected hash calculation

Here's a suggested refactor:

-// TestValdiatorSetHashVectors checks if provided validator threshold pubkey and quorum hash returns expected hash
-func TestValdiatorSetHashVectors(t *testing.T) {
+// TestValidatorSetHashVectors verifies the ValidatorSet.Hash() computation using known test vectors.
+// These test vectors were generated from [add source/reference here].
+func TestValidatorSetHashVectors(t *testing.T) {
+	testCases := []struct {
+		name               string
+		thresholdPubKeyB64 string
+		quorumHashHex      string
+		expectedHashHex    string
+	}{
+		{
+			name:               "known good vector",
+			thresholdPubKeyB64: "gw5F5F5kFNnWFUc8woFOaxccUI+cd+ixaSS3RZT2HJlWpvoWM16YRn6sjYvbdtGH",
+			quorumHashHex:      "703ee5bfc78765cc9e151d8dd84e30e196ababa83ac6cbdee31a88a46bba81b9",
+			expectedHashHex:    "81742F95E99EAE96ABC727FE792CECB4996205DE6BFC88AFEE1F60B96BC648B2",
+		},
+		// Add more test vectors here
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			thresholdPublicKey, err := base64.RawStdEncoding.DecodeString(tc.thresholdPubKeyB64)
+			require.NoError(t, err)
+
+			quorumHash, err := hex.DecodeString(tc.quorumHashHex)
+			require.NoError(t, err)
+
+			valset := ValidatorSet{
+				ThresholdPublicKey: bls12381.PubKey(thresholdPublicKey),
+				QuorumHash:         quorumHash,
+			}
+
+			hash := valset.Hash()
+			assert.Equal(t, tc.expectedHashHex, hash.String())
+		})
+	}
internal/libs/sync/mutexguard_test.go (1)

104-141: Consider making sleep durations configurable.

While the test effectively simulates real-world scenarios with processing delays, fixed sleep durations can make tests flaky on different systems or under load.

Consider making the sleep durations configurable or using channels/contexts for more deterministic timing:

-		time.Sleep(10 * time.Millisecond) // Simulate read delay
+		time.Sleep(testReadDelay) // Simulate read delay
-		time.Sleep(10 * time.Millisecond) // Simulate write delay
+		time.Sleep(testWriteDelay) // Simulate write delay

Where testReadDelay and testWriteDelay are configurable durations, possibly passed as test parameters or defined as constants that can be adjusted based on the test environment.

internal/statesync/stateprovider.go (1)

274-279: Consider reducing code duplication.

The LightBlock implementation is identical in both stateProviderRPC and stateProviderP2P.

Consider extracting the common implementation to a shared helper function or base struct:

+func getVerifiedLightBlock(s interface {
+    sync.Locker
+    verifyLightBlockAtHeight(ctx context.Context, height uint64, ts time.Time) (*types.LightBlock, error)
+}, ctx context.Context, height uint64) (*types.LightBlock, error) {
+    s.Lock()
+    defer s.Unlock()
+    return s.verifyLightBlockAtHeight(ctx, height, time.Now())
+}

 func (s *stateProviderP2P) LightBlock(ctx context.Context, height uint64) (*types.LightBlock, error) {
-    s.Lock()
-    defer s.Unlock()
-    return s.verifyLightBlockAtHeight(ctx, height, time.Now())
+    return getVerifiedLightBlock(s, ctx, height)
 }
internal/statesync/syncer.go (1)

552-556: Enhance error message with context.

The error message could be more descriptive to help with debugging.

-			s.logger.Error("failed to request snapshot chunk", "err", err, "chunkID", ID)
+			s.logger.Error("failed to request snapshot chunk from peer", 
+				"err", err, 
+				"chunkID", ID,
+				"height", snapshot.Height,
+				"version", snapshot.Version)
internal/statesync/reactor_test.go (1)

770-784: Consider adding backoff to retry logic.

While the retry function works, it could be more efficient with exponential backoff.

 func retryUntil(ctx context.Context, t *testing.T, fn func() bool, timeout time.Duration) {
 	t.Helper()
 
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
 
+	backoff := time.Millisecond
 	for {
 		if fn() {
 			return
 		}
 		require.NoError(t, ctx.Err())
+		time.Sleep(backoff)
+		backoff = time.Duration(float64(backoff) * 1.5) // exponential backoff
+		if backoff > time.Second {
+			backoff = time.Second // cap maximum backoff
+		}
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4de86a and c46ebe1.

📒 Files selected for processing (6)
  • internal/libs/sync/mutexguard_test.go (1 hunks)
  • internal/statesync/reactor.go (19 hunks)
  • internal/statesync/reactor_test.go (16 hunks)
  • internal/statesync/stateprovider.go (6 hunks)
  • internal/statesync/syncer.go (13 hunks)
  • types/validator_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/statesync/reactor.go
🧰 Additional context used
📓 Learnings (1)
internal/statesync/syncer.go (1)
Learnt from: lklimek
PR: dashpay/tenderdash#1026
File: internal/statesync/syncer.go:380-384
Timestamp: 2025-01-24T08:11:01.354Z
Learning: For genesis height conversion from int64 to uint64, using MustConvertUint64 is preferred over error handling since the height is already validated to be positive and cannot exceed uint64 range.
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: test_abci_cli
  • GitHub Check: test_apps
  • GitHub Check: tests (04)
  • GitHub Check: tests (02)
  • GitHub Check: tests (01)
  • GitHub Check: e2e-test (rotate)
  • GitHub Check: Super linter
  • GitHub Check: e2e-test (dashcore)
🔇 Additional comments (8)
internal/libs/sync/mutexguard_test.go (3)

14-42: Well-structured test for lock guard behavior!

The test thoroughly verifies the LockGuard's behavior with multiple unlock attempts, including proper cleanup of deadlock detection settings.


44-65: Excellent concurrency test design!

The test effectively verifies thread safety using multiple goroutines and proper synchronization with WaitGroup.


67-102: Comprehensive test for read/write lock behavior!

The test effectively verifies RWMutex functionality with a good mix of concurrent read and write operations.

internal/statesync/stateprovider.go (2)

43-44: Clean interface extension!

The new LightBlock method follows the same pattern as existing methods in the interface.


129-134: Clean implementation of LightBlock method!

The implementation follows the same pattern as other methods, properly handling locking and reusing existing verification logic.

internal/statesync/syncer.go (2)

607-634: Well-structured finalization logic!

The finalizeSnapshot implementation includes proper null checks, error handling, and clear logging.


379-387: Appropriate use of MustConvertUint64!

The code correctly validates initial height and uses MustConvertUint64 as the height is already validated to be positive.

internal/statesync/reactor_test.go (1)

241-242: Good test coverage for FinalizeSnapshot!

The mock expectations properly verify the new finalization functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In review / testing
Development

Successfully merging this pull request may close these issues.

1 participant